{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2ffc6a2b",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/low_level/oss_ingestion_retrieval.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dff7db9e-fbf9-4394-9958-35323799a4e3",
   "metadata": {},
   "source": [
    "# Building RAG from Scratch (Open-source only!) \n",
    "\n",
    "In this tutorial, we show you how to build a data ingestion pipeline into a vector database, and then build a retrieval pipeline from that vector database, from scratch.\n",
    "\n",
    "Notably, we use a fully open-source stack:\n",
    "\n",
    "- Sentence Transformers as the embedding model\n",
    "- Postgres as the vector store (we support many other [vector stores](https://gpt-index.readthedocs.io/en/stable/module_guides/storing/vector_stores.html) too!)\n",
    "- Llama 2 as the LLM (through [llama.cpp](https://github.com/ggerganov/llama.cpp))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25764729-40ba-400f-b0f8-08fb9e8bb74a",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "We setup our open-source components.\n",
    "1. Sentence Transformers\n",
    "2. Llama 2\n",
    "3. We initialize postgres and wrap it with our wrappers/abstractions."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63935557-a11c-4a22-9248-9c746cc89c4c",
   "metadata": {},
   "source": [
    "#### Sentence Transformers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b7108d3f",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-readers-file\n",
    "%pip install llama-index-vector-stores-postgres\n",
    "%pip install llama-index-embeddings-huggingface\n",
    "%pip install llama-index-llms-llama-cpp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4c08162e-5a48-424c-921f-c9e84a59c72f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# sentence transformers\n",
    "from llama_index.embeddings.huggingface import HuggingFaceEmbedding\n",
    "\n",
    "embed_model = HuggingFaceEmbedding(model_name=\"BAAI/bge-small-en\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df10089c-917e-4191-a718-0ef7149a6a1e",
   "metadata": {},
   "source": [
    "#### Llama CPP\n",
    "\n",
    "In this notebook, we use the [`llama-2-chat-13b-ggml`](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML) model, along with the proper prompt formatting. \n",
    "\n",
    "Check out our [Llama CPP guide](https://gpt-index.readthedocs.io/en/stable/examples/llm/llama_2_llama_cpp.html) for full setup instructions/details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85f8a556-9f37-42a3-a88a-f688ad355ee5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: llama-cpp-python in /Users/jerryliu/Programming/gpt_index/.venv/lib/python3.10/site-packages (0.2.7)\n",
      "Requirement already satisfied: numpy>=1.20.0 in /Users/jerryliu/Programming/gpt_index/.venv/lib/python3.10/site-packages (from llama-cpp-python) (1.23.5)\n",
      "Requirement already satisfied: typing-extensions>=4.5.0 in /Users/jerryliu/Programming/gpt_index/.venv/lib/python3.10/site-packages (from llama-cpp-python) (4.7.1)\n",
      "Requirement already satisfied: diskcache>=5.6.1 in /Users/jerryliu/Programming/gpt_index/.venv/lib/python3.10/site-packages (from llama-cpp-python) (5.6.3)\n",
      "\n",
      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip available: \u001b[0m\u001b[31;49m22.3.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.2.1\u001b[0m\n",
      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "!pip install llama-cpp-python"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3cb975f6-c192-4a26-ae50-e9a319d2a66b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.llama_cpp import LlamaCPP\n",
    "\n",
    "# model_url = \"https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin\"\n",
    "model_url = \"https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf\"\n",
    "\n",
    "llm = LlamaCPP(\n",
    "    # You can pass in the URL to a GGML model to download it automatically\n",
    "    model_url=model_url,\n",
    "    # optionally, you can set the path to a pre-downloaded model instead of model_url\n",
    "    model_path=None,\n",
    "    temperature=0.1,\n",
    "    max_new_tokens=256,\n",
    "    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room\n",
    "    context_window=3900,\n",
    "    # kwargs to pass to __call__()\n",
    "    generate_kwargs={},\n",
    "    # kwargs to pass to __init__()\n",
    "    # set to at least 1 to use GPU\n",
    "    model_kwargs={\"n_gpu_layers\": 1},\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba02cfe2-8b51-4e01-a840-d6508c76ade3",
   "metadata": {},
   "source": [
    "#### Initialize Postgres\n",
    "\n",
    "Using an existing postgres running at localhost, create the database we'll be using.\n",
    "\n",
    "**NOTE**: Of course there are plenty of other open-source/self-hosted databases you can use! e.g. Chroma, Qdrant, Weaviate, and many more. Take a look at our [vector store guide](https://gpt-index.readthedocs.io/en/stable/module_guides/storing/vector_stores.html).\n",
    "\n",
    "**NOTE**: You will need to setup postgres on your local system. Here's an example of how to set it up on OSX: https://www.sqlshack.com/setting-up-a-postgresql-database-on-mac/.\n",
    "\n",
    "**NOTE**: You will also need to install pgvector (https://github.com/pgvector/pgvector).\n",
    "\n",
    "You can add a role like the following:\n",
    "```\n",
    "CREATE ROLE <user> WITH LOGIN PASSWORD '<password>';\n",
    "ALTER ROLE <user> SUPERUSER;\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "347fbc8c-776a-43df-add5-8ead2b1a6e44",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install psycopg2-binary pgvector asyncpg \"sqlalchemy[asyncio]\" greenlet"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e730f40-5aa4-4e89-b4ff-26d84183dfed",
   "metadata": {},
   "outputs": [],
   "source": [
    "import psycopg2\n",
    "\n",
    "db_name = \"vector_db\"\n",
    "host = \"localhost\"\n",
    "password = \"password\"\n",
    "port = \"5432\"\n",
    "user = \"jerry\"\n",
    "# conn = psycopg2.connect(connection_string)\n",
    "conn = psycopg2.connect(\n",
    "    dbname=\"postgres\",\n",
    "    host=host,\n",
    "    password=password,\n",
    "    port=port,\n",
    "    user=user,\n",
    ")\n",
    "conn.autocommit = True\n",
    "\n",
    "with conn.cursor() as c:\n",
    "    c.execute(f\"DROP DATABASE IF EXISTS {db_name}\")\n",
    "    c.execute(f\"CREATE DATABASE {db_name}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6ba2d25a-a0aa-4d11-93b0-6288dc008148",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sqlalchemy import make_url\n",
    "from llama_index.vector_stores.postgres import PGVectorStore\n",
    "\n",
    "vector_store = PGVectorStore.from_params(\n",
    "    database=db_name,\n",
    "    host=host,\n",
    "    password=password,\n",
    "    port=port,\n",
    "    user=user,\n",
    "    table_name=\"llama2_paper\",\n",
    "    embed_dim=384,  # openai embedding dimension\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e65e0a69-0668-4df4-a809-71f38695cfea",
   "metadata": {},
   "source": [
    "## Build an Ingestion Pipeline from Scratch\n",
    "\n",
    "We show how to build an ingestion pipeline as mentioned in the introduction.\n",
    "\n",
    "We fast-track the steps here (can skip metadata extraction). More details can be found [in our dedicated ingestion guide](https://gpt-index.readthedocs.io/en/latest/examples/low_level/ingestion.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "48febfa0-6a5a-44c9-900e-4316c35d8e81",
   "metadata": {},
   "source": [
    "### 1. Load Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9cd76dd-4db5-4012-9a46-dd9ed87b3e3b",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir data\n",
    "!wget --user-agent \"Mozilla\" \"https://arxiv.org/pdf/2307.09288.pdf\" -O \"data/llama2.pdf\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "640fcc02-0d50-4443-b8c7-f3953e006461",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "from llama_index.readers.file import PyMuPDFReader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9c64ce4-b778-4f3b-bc7e-266e0e124308",
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = PyMuPDFReader()\n",
    "documents = loader.load(file_path=\"./data/llama2.pdf\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c001b7c-3e79-4d11-bd0e-dc774da25de1",
   "metadata": {},
   "source": [
    "### 2. Use a Text Splitter to Split Documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c8125e1e-097a-4588-a65a-102dba5b8eff",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.node_parser import SentenceSplitter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2519370-c907-4c46-9afb-a1295f69dbb0",
   "metadata": {},
   "outputs": [],
   "source": [
    "text_parser = SentenceSplitter(\n",
    "    chunk_size=1024,\n",
    "    # separator=\" \",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "972b4008-eb9f-48e6-95c8-1328e231f98b",
   "metadata": {},
   "outputs": [],
   "source": [
    "text_chunks = []\n",
    "# maintain relationship with source doc index, to help inject doc metadata in (3)\n",
    "doc_idxs = []\n",
    "for doc_idx, doc in enumerate(documents):\n",
    "    cur_text_chunks = text_parser.split_text(doc.text)\n",
    "    text_chunks.extend(cur_text_chunks)\n",
    "    doc_idxs.extend([doc_idx] * len(cur_text_chunks))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5aaae403-d46e-450b-b19f-27ca66e28f1c",
   "metadata": {},
   "source": [
    "### 3. Manually Construct Nodes from Text Chunks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee261129-0f56-4672-9804-a38ea05244cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.schema import TextNode\n",
    "\n",
    "nodes = []\n",
    "for idx, text_chunk in enumerate(text_chunks):\n",
    "    node = TextNode(\n",
    "        text=text_chunk,\n",
    "    )\n",
    "    src_doc = documents[doc_idxs[idx]]\n",
    "    node.metadata = src_doc.metadata\n",
    "    nodes.append(node)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65eac30b-27ec-4206-947c-81104dc8babe",
   "metadata": {},
   "source": [
    "### 4. Generate Embeddings for each Node\n",
    "\n",
    "Here we generate embeddings for each Node using a sentence_transformers model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82ef7573-a608-420d-91ab-9fbf17af3e9d",
   "metadata": {},
   "outputs": [],
   "source": [
    "for node in nodes:\n",
    "    node_embedding = embed_model.get_text_embedding(\n",
    "        node.get_content(metadata_mode=\"all\")\n",
    "    )\n",
    "    node.embedding = node_embedding"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e7fe354a-4f87-4b13-aae4-f7b7b1fe118e",
   "metadata": {},
   "source": [
    "### 5. Load Nodes into a Vector Store\n",
    "\n",
    "We now insert these nodes into our `PostgresVectorStore`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44add008-616b-4e47-8f61-553befeb7ca4",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store.add(nodes)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d574a062-1900-4b74-be9a-6248ffb8bbbe",
   "metadata": {},
   "source": [
    "## Build Retrieval Pipeline from Scratch\n",
    "\n",
    "We show how to build a retrieval pipeline. Similar to ingestion, we fast-track the steps. Take a look at our [retrieval guide](https://gpt-index.readthedocs.io/en/latest/examples/low_level/retrieval.html) for more details!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cd7a7465-6a3a-4379-8ac9-1dad7d8441e2",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_str = \"Can you tell me about the key concepts for safety finetuning\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20b076aa-4ebb-4f26-8b31-387a01a47405",
   "metadata": {},
   "source": [
    "### 1. Generate a Query Embedding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "159bda45-beb5-48bc-bf33-b9d1c44188b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_embedding = embed_model.get_query_embedding(query_str)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf2c8594-bc95-41b0-a0bb-35b4f02a734f",
   "metadata": {},
   "source": [
    "### 2. Query the Vector Database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5c62ffc-2092-4fd0-85c4-acde0b6c3b4f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# construct vector store query\n",
    "from llama_index.core.vector_stores import VectorStoreQuery\n",
    "\n",
    "query_mode = \"default\"\n",
    "# query_mode = \"sparse\"\n",
    "# query_mode = \"hybrid\"\n",
    "\n",
    "vector_store_query = VectorStoreQuery(\n",
    "    query_embedding=query_embedding, similarity_top_k=2, mode=query_mode\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "06051002-db44-404b-a946-2b37b3b6ca67",
   "metadata": {},
   "outputs": [],
   "source": [
    "# returns a VectorStoreQueryResult\n",
    "query_result = vector_store.query(vector_store_query)\n",
    "print(query_result.nodes[0].get_content())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dff153ee-5774-4374-98a6-038775fb1d6a",
   "metadata": {},
   "source": [
    "### 3. Parse Result into a Set of Nodes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9efda092-69ee-404e-8667-28e866c0e4d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.schema import NodeWithScore\n",
    "from typing import Optional\n",
    "\n",
    "nodes_with_scores = []\n",
    "for index, node in enumerate(query_result.nodes):\n",
    "    score: Optional[float] = None\n",
    "    if query_result.similarities is not None:\n",
    "        score = query_result.similarities[index]\n",
    "    nodes_with_scores.append(NodeWithScore(node=node, score=score))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "445ee65d-bd12-46e5-817d-e21d97718338",
   "metadata": {},
   "source": [
    "### 4. Put into a Retriever"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f817dbf4-926c-4aa2-a3b6-946c45df0893",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import QueryBundle\n",
    "from llama_index.core.retrievers import BaseRetriever\n",
    "from typing import Any, List\n",
    "\n",
    "\n",
    "class VectorDBRetriever(BaseRetriever):\n",
    "    \"\"\"Retriever over a postgres vector store.\"\"\"\n",
    "\n",
    "    def __init__(\n",
    "        self,\n",
    "        vector_store: PGVectorStore,\n",
    "        embed_model: Any,\n",
    "        query_mode: str = \"default\",\n",
    "        similarity_top_k: int = 2,\n",
    "    ) -> None:\n",
    "        \"\"\"Init params.\"\"\"\n",
    "        self._vector_store = vector_store\n",
    "        self._embed_model = embed_model\n",
    "        self._query_mode = query_mode\n",
    "        self._similarity_top_k = similarity_top_k\n",
    "        super().__init__()\n",
    "\n",
    "    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:\n",
    "        \"\"\"Retrieve.\"\"\"\n",
    "        query_embedding = embed_model.get_query_embedding(\n",
    "            query_bundle.query_str\n",
    "        )\n",
    "        vector_store_query = VectorStoreQuery(\n",
    "            query_embedding=query_embedding,\n",
    "            similarity_top_k=self._similarity_top_k,\n",
    "            mode=self._query_mode,\n",
    "        )\n",
    "        query_result = vector_store.query(vector_store_query)\n",
    "\n",
    "        nodes_with_scores = []\n",
    "        for index, node in enumerate(query_result.nodes):\n",
    "            score: Optional[float] = None\n",
    "            if query_result.similarities is not None:\n",
    "                score = query_result.similarities[index]\n",
    "            nodes_with_scores.append(NodeWithScore(node=node, score=score))\n",
    "\n",
    "        return nodes_with_scores"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6dbaf309-a3fb-4d01-bc7b-4efab92e6e3d",
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = VectorDBRetriever(\n",
    "    vector_store, embed_model, query_mode=\"default\", similarity_top_k=2\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "977c28a6-065a-408a-b007-e611d2d99153",
   "metadata": {},
   "source": [
    "## Plug this into our RetrieverQueryEngine to synthesize a response"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ae459e87-daff-433d-a2bd-fd4a934357ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.query_engine import RetrieverQueryEngine\n",
    "\n",
    "query_engine = RetrieverQueryEngine.from_args(retriever, llm=llm)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "312bdaba-6a91-4601-96f9-c64d1ae09007",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_str = \"How does Llama 2 perform compared to other open-source models?\"\n",
    "\n",
    "response = query_engine.query(query_str)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72f1096f-96b6-4a90-8524-b9ebe4532661",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Based on the results shown in Table 3, Llama 2 outperforms all open-source models on most of the benchmarks, with an average improvement of around 5 points over the next best model (GPT-3.5).\n"
     ]
    }
   ],
   "source": [
    "print(str(response))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7838289-5a59-4042-b4b7-037f66d99be4",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(response.source_nodes[0].get_content())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_index_v2",
   "language": "python",
   "name": "llama_index_v2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
